The Data Enthusiast Blog - R Workshop

R Workshop

Fun Fact

  • If you’re someone who likes to use



to run data analysis, you’re already “using” R

Downloading R


You’ll want to download R before downloading any additional software

You can download the newest version of R (4.2) here

Downloading RStudio

RStudio is the last software program you’ll need to get started

You can download the newest version of RStudio (Dec 2022) here

R “Quirks”

  • R is case sensitive, so watch your spelling and the case you use
    • Case =/= case
  • R hates spaces in variable names. Your code will not run if a name contains a space
    • variable_1 is a GOOD variable name
    • variable 1 is a BAD variable name
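Both quirks can be checked in a few lines of base R. This is only a sketch: the object names and the helper `is_valid_code()` are made up for illustration.

```r
# Case matters: these are two different objects
score <- 10
Score <- 20
same_value <- score == Score  # FALSE, because they hold different values

# Hypothetical helper: TRUE if R can parse the code, FALSE otherwise
is_valid_code <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

good_name <- is_valid_code("variable_1 <- 5")  # underscore instead of a space
bad_name  <- is_valid_code("variable 1 <- 5")  # the space breaks the name
```

Here `good_name` is TRUE and `bad_name` is FALSE, because R reads `variable 1` as two separate things rather than one name.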

Downloading Materials For Day 1

You can download all the materials for Day 1 of this workshop here

  • You want the correlation.qmd, data_clean.qmd, ttest.qmd, and regression.qmd files

Installing and Loading Packages

# To Install a Package
install.packages("tidyverse")

# To Load a Package
library(tidyverse)
  • You only have to install a package once (one exception)

  • You must load a package every time you restart R/RStudio, and every file you run or render (.R, .qmd, .Rmd, etc.) should load the packages it uses
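One way to avoid re-installing a package that is already there is to check first. The sketch below assumes a made-up helper called `needs_install()`, and it uses the built-in "stats" package so that it runs without an internet connection; in practice you would swap in "tidyverse".

```r
# Hypothetical helper: TRUE when a package is not yet installed
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

pkg <- "stats"  # ships with R; replace with "tidyverse" in practice

# Install only when the package is missing
if (needs_install(pkg)) {
  install.packages(pkg)
}

# Loading still has to happen in every new R session
library(pkg, character.only = TRUE)
```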

Important

A new install of R will remove all installed packages. You must either re-install the packages or save them prior to a new R installation. I’ll cover how to save them at a later date

Importing Data

  • R works best with .csv files (smaller in size), but it will also read .sav files (SPSS) and other file formats (e.g., .tsv)

  • Oh and obviously it can read Microsoft Excel files

# For CSV

data <- read.csv("file_name.csv")

# For TSV (read_tsv() comes from the readr package)

library(readr)
data <- read_tsv("file_name.tsv")

# For SAV

library(haven)
data <- read_sav("file_name.sav")

# For EXCEL

library(readxl)
data <- read_xlsx("file_name.xlsx")
data <- read_xls("file_name.xls")

Variable Types

  • Numerical
    • Any real number, positive or negative, in \((-\infty, \infty)\)
  • Integer
    • Any whole number, positive or negative, in \((-\infty, \infty)\)
  • Factor
    • A grouping category
  • Character
    • A text string
  • Logical
    • A TRUE or FALSE value (e.g., Is X > 1?)
  • Date
    • Exactly what you think it is: a calendar date
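To see how R labels each of these types, you can create one value of each and ask for its class. The values below are made up for illustration.

```r
numeric_value   <- 3.14                        # Numerical
integer_value   <- 42L                         # Integer (the L marks a whole number)
factor_value    <- factor(c("control", "treatment", "control"))  # Factor
character_value <- "free response text"        # Character
logical_value   <- numeric_value > 1           # Logical: is X > 1?
date_value      <- as.Date("2022-12-01")       # Date

# class() reports the type R assigned to each value
types <- c(
  class(numeric_value),
  class(integer_value),
  class(factor_value),
  class(character_value),
  class(logical_value),
  class(date_value)
)
```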

{dplyr} R package

Uses for the dplyr package

    • Primary use is to transform and manipulate data in a data set
      • Calculate means, log transform, compute basic summary statistics
    • Anyone here who has worked with databases will notice that it mimics SQL
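    As a rough sketch of those uses, the example below log transforms a column and then computes a mean per group. The data frame and its column names are made up, and the code assumes the dplyr package is installed.

```r
library(dplyr)

# Made-up data: one row per participant
scores <- data.frame(
  group = c("a", "a", "b", "b"),
  score = c(10, 20, 30, 50)
)

# Add a log-transformed column (similar to a computed column in SQL)
scores_log <- scores %>%
  mutate(log_score = log(score))

# Summarise by group (similar to GROUP BY in SQL)
group_summary <- scores %>%
  group_by(group) %>%
  summarise(mean_score = mean(score), n = n())
```

    The `%>%` pipe passes the result of one step into the next, which is why the code reads top to bottom like a recipe.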
    Note

    For anyone who might work with database data, you can pull data from external databases with R and RStudio. We won’t cover that in this workshop but you can do it

    Live-ish Coding

    • Open the data_clean.qmd file provided. We’re going to walk through some example code and then live code a little bit

    {stringr} R package

    Uses for the stringr package

    • Primarily used for dealing with character or string data
      • Useful for free response questions
      • Essentially it’s the dplyr package for string variable types
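    A small sketch of cleaning free response text, assuming the stringr package is installed. The responses are made up for illustration.

```r
library(stringr)

# Made-up free responses with messy spacing, case, and punctuation
responses <- c("  Very Happy ", "not sure", "HAPPY!")

# Trim surrounding spaces and convert to lower case
clean <- str_to_lower(str_trim(responses))

# Remove punctuation
clean <- str_replace_all(clean, "[[:punct:]]", "")

# Flag responses that mention the word "happy"
mentions_happy <- str_detect(clean, "happy")
```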

    Live-ish Coding

    • Go back to the data_clean.qmd file provided. We’re going to walk through some example code and then live code a little bit

    {lubridate} R package

    Uses for the lubridate package

    • Primarily used for dealing with dates
    • Provides handy functions for converting between date formats
      • E.g., MM-DD-YY to DD-MM-YY or Month Day, Year
    Tip

    It won’t auto convert your dates to weird incorrect formats like certain spreadsheet programs might.
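    A minimal sketch of a date conversion with lubridate, assuming the package is installed. The date string is made up; `mdy()` parses month-day-year text, and base R's `format()` writes the date back out in another layout.

```r
library(lubridate)

raw_date <- "12-01-22"    # MM-DD-YY as text
parsed   <- mdy(raw_date) # now stored as a real Date

# Re-express the same date in other formats
day_first <- format(parsed, "%d-%m-%y")   # DD-MM-YY
long_form <- format(parsed, "%B %d, %Y")  # Month Day, Year (month name depends on locale)
```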

    Live-ish Coding

    • Go back to the data_clean.qmd file provided. We’re going to walk through some example code and then live code a little bit

    {ggplot2} R package

    Uses for the ggplot2 package

    • This may be the most downloaded R package, and it’s probably not close
    • This is THE visualization package in R. If you can THINK of a graphic, this package can create it
      • E.g., box plots, violin plots, bar graphs, etc
    • If you’re REALLY good, you can do this (credit @ralitza_s)
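    For a far more modest starting point, here is a sketch of a box plot. It assumes ggplot2 is installed and uses the `mtcars` data set that ships with R as a stand-in for your own data.

```r
library(ggplot2)

# Box plot of fuel economy by number of cylinders
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "Number of cylinders", y = "Miles per gallon")

# Run print(p), or just p in the console, to draw the plot
```

    Swapping `geom_boxplot()` for `geom_violin()` turns the same plot into a violin plot, which is the basic idea behind building graphics in layers.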

    Live-ish Coding

    • Go back to the data_clean.qmd file provided. We’re going to walk through some example code and then live code a little bit

    Exporting Data

    • Sometimes you want or need to export data you’ve cleaned to another program. Maybe you want to use a program like JASP

    • Or you’re not comfortable using R for analyses yet so you want to use SPSS

    • Or maybe others on your team use a different program

    • R can export to Excel, SPSS, SAS, and CSV files

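    # For Excel
    library(openxlsx)
    write.xlsx(df, file = "filename.xlsx")

    # For SPSS
    library(haven)
    write_sav(df, path = "filename.sav")

    # For SAS (transport format; haven's write_sas() is deprecated)
    write_xpt(df, path = "filename.xpt")

    # For CSV (row.names = FALSE avoids writing an extra index column)
    write.csv(df, file = "filename.csv", row.names = FALSE)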

    Lunch Break

    • Back at 2pm

    Putting It All Together From Start To Finish

    Statistical Analyses in R

    Correlation

    Assumptions (Field et al., 2012)

    • Variables are measured on at least an interval scale
    • Normality of Residuals
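    As a rough sketch in base R, the example below runs a Pearson correlation and a Shapiro-Wilk check on the residuals. The data are made up, and the Shapiro-Wilk test is just one common way to look at normality, not necessarily the approach used in correlation.qmd.

```r
# Made-up data: hours studied and exam score
hours <- c(1, 2, 3, 4, 5, 6)
score <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

# Pearson correlation with a significance test
result  <- cor.test(hours, score, method = "pearson")
r       <- unname(result$estimate)
p_value <- result$p.value

# Normality of the residuals from the straight-line fit
normality <- shapiro.test(residuals(lm(score ~ hours)))
```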

    Live-ish Coding

    • Please see correlation.qmd file provided

    T Tests

    Assumptions (Field et al., 2012)

    • Normality of Residuals
    • Independent Observations
    • Homogeneity of Variance
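    A minimal sketch of an independent samples t-test in base R with made-up data. The F test from `var.test()` is used here as a simple stand-in for a homogeneity of variance check; ttest.qmd may use a different test.

```r
# Made-up scores for two independent groups
control   <- c(5.1, 4.9, 5.6, 5.8, 6.0)
treatment <- c(6.9, 7.2, 6.5, 7.8, 7.1)

# Homogeneity of variance: F test comparing the two variances
variance_check <- var.test(control, treatment)

# Independent samples t-test assuming equal variances
result <- t.test(control, treatment, var.equal = TRUE)
```

    If the variances look unequal, dropping `var.equal = TRUE` gives Welch's t-test, which is the default in R.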

    Live-ish Coding

    • Please see the ttest.qmd file provided

    Regression

    Assumptions (Field et al., 2012)

    • Outliers and Influential Cases
    • Normality of Residuals
    • Independent Observations
    • Homogeneity of Variance
    Important

    While important, outliers and influential cases rarely change results when the sample size is sufficient. It is also difficult to say what “is” and “isn’t” an outlier, and flagging an outlier shouldn’t automatically mean removing it
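    A rough sketch of a simple regression with two of the checks above, in base R. It uses the built-in `mtcars` data as a stand-in, and Cook's distance is only one of several ways to look for influential cases.

```r
# Predict fuel economy from car weight
fit   <- lm(mpg ~ wt, data = mtcars)
slope <- unname(coef(fit)["wt"])

# Normality of residuals
normality <- shapiro.test(residuals(fit))

# Influential cases: Cook's distance for every observation
cooks <- cooks.distance(fit)
most_influential <- names(which.max(cooks))

# summary(fit) prints the full regression table
```

    Looking at which case is most influential is a prompt to inspect it, not a reason to delete it.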

    Live-ish Coding

    • Please open the regression.qmd file provided

    End of Day 1

    Downloading Material For Day 2

    You can download all the materials for Day 2 of this workshop here

    • You want the anova.qmd, nonparametric.qmd, intro_quarto.qmd, mlm.qmd, sem.qmd, and factor_analysis.qmd files

    ANOVA: Including Repeated Measures & Factorial

    Assumptions (Field et al., 2012)

    • Normality Within Groups
    • Homogeneity of Variance
    • Independent Observations
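    A minimal sketch of a one-way ANOVA only (not repeated measures or factorial), using the `PlantGrowth` data set that ships with R. Bartlett's test and the Shapiro-Wilk test are used as example assumption checks; anova.qmd may use others.

```r
# Normality within groups: one Shapiro-Wilk p-value per group
normality_by_group <- tapply(
  PlantGrowth$weight,
  PlantGrowth$group,
  function(x) shapiro.test(x)$p.value
)

# Homogeneity of variance across groups
variance_check <- bartlett.test(weight ~ group, data = PlantGrowth)

# One-way ANOVA: does mean weight differ by group?
fit         <- aov(weight ~ group, data = PlantGrowth)
anova_table <- summary(fit)[[1]]
p_value     <- anova_table[["Pr(>F)"]][1]
```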

    Live-ish Coding

    • Please open the anova.qmd file provided

    Non-Parametric Tests

    • Wilcoxon Rank-Sum Test (i.e., Mann–Whitney Test)
      • Non-parametric equivalent of the independent samples t-test
    • Wilcoxon Signed-Rank Test
      • Non-parametric equivalent of the dependent samples t-test
    • Kruskal–Wallis Test
      • Non-parametric equivalent of a one-way ANOVA
    • Friedman’s Test
      • Non-parametric equivalent of a repeated measures ANOVA
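    All four tests are available in base R. The sketch below uses made-up numbers plus the built-in `PlantGrowth` data, so the object names are illustrative only.

```r
# Wilcoxon rank-sum (Mann-Whitney): two independent groups
group_1  <- c(1.1, 2.3, 3.2, 4.8, 5.5)
group_2  <- c(6.1, 7.4, 8.0, 9.3, 10.2)
rank_sum <- wilcox.test(group_1, group_2)

# Wilcoxon signed-rank: the same people measured twice
before      <- c(10, 12, 14, 16, 18, 20)
after       <- c(11.5, 14.2, 14.9, 18.6, 21.1, 23.7)
signed_rank <- wilcox.test(before, after, paired = TRUE)

# Kruskal-Wallis: three or more independent groups
kruskal <- kruskal.test(weight ~ group, data = PlantGrowth)

# Friedman: rows are participants, columns are repeated conditions
repeated <- matrix(
  c(1, 2, 3,
    2, 3, 4,
    1, 3, 5,
    2, 4, 6,
    1, 2, 4),
  nrow = 5, byrow = TRUE
)
friedman <- friedman.test(repeated)
```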

    Live-ish Coding

    • Please open the nonparametric.qmd file

    EFA & CFA

    EFA Assumptions (Field et al., 2012)

    • Sufficient Sample Size
    • Normality of Items
    • Correlation Between Items¹
    • Appropriate Determinant (\(\text{Det} > 1 \times 10^{-5}\))
    Important
    1. We want items to correlate, but we do not want them to correlate either
      too weakly (\(r < .30\)) or too strongly (\(r > .80\)) across multiple items
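    The determinant and correlation checks can be sketched in base R as follows. Four columns of the built-in `mtcars` data stand in for questionnaire items here, which is purely an assumption for illustration.

```r
# Stand-in "items": replace with your own item columns
items    <- mtcars[, c("mpg", "disp", "hp", "wt")]
item_cor <- cor(items)

# Determinant of the correlation matrix should exceed 1 x 10^-5
determinant    <- det(item_cor)
determinant_ok <- determinant > 1e-5

# Absolute correlations between each pair of items
off_diag <- abs(item_cor[upper.tri(item_cor)])
too_low  <- off_diag < .30  # pairs that barely correlate
too_high <- off_diag > .80  # pairs that may be redundant
```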

    CFA Assumptions

    • Multivariate Normality

    Live-ish Coding

    • Please open the factor_analysis.qmd file

    Lunch Break

    • Back at 2pm

    SEM

    Assumptions To Test (Kaplan, 2001, p. 15218)

    • Multivariate Normality
    • No Systematic Missing Data
    • Sufficiently Large Sample Size
    • Correct Model Specification

    Live-ish Coding

    • Please open the sem.qmd file

    MLM (Field et al., 2012)

    Assumptions To Test

    • Outliers and Influential Cases
    • Normality of Residuals
    • Independent Observations¹
    • Homogeneity of Variance
    A Note On Independence
    1. This assumption is not necessarily a concern because MLM is built for nested observations, which are not independent of one another (Field et al., 2012)

    Live-ish Coding

    • Please open the mlm.qmd file

    Quarto: Code + Text

    The Holy Grail of Reproducibility

    • What if I told you that it was possible to generate 95% of what you need for a manuscript within RStudio AND you could integrate your analyses as well?
    • What if I also said you could export this to Microsoft Word?

    Let’s Talk About Quarto

    • Please open the intro_quarto.qmd file

    Final Thoughts


    First, thank you for your time this weekend. I hope you’ve learned something

    Second, this is A LOT. I crammed about 2 years of statistical analysis practice into about 2 days. It’s okay and normal if your head is swimming. I’m here if anyone has questions afterward, including when you’re using R for an actual analysis. People ask me for help all the time, and I’m happy to help

    Finally, one last “minor” detail. Some of you already know this, but let’s just check out the following link
